Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for expver dimension #56

Merged
merged 2 commits into from
Oct 5, 2023
Merged

Fix for expver dimension #56

merged 2 commits into from
Oct 5, 2023

Conversation

DarshanSP19
Copy link
Collaborator

Added the fix for expver dimension for recent months data.

@DarshanSP19 DarshanSP19 self-assigned this Sep 25, 2023
@DarshanSP19 DarshanSP19 requested a review from alxmrs September 25, 2023 12:18
@DarshanSP19
Copy link
Collaborator Author

Hey @alxmrs We for the majority of the data files this fix seems working (Verified after back filling the data). But following two variables are hitting the assertion assert disjoint_nans, "The nans are not disjoint in expver=1 vs 5". It seems like data itself is inappropriate.

cloud_base_height
convective_inhibition

Copy link
Collaborator

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. If this has been tested on realistic data, than it makes sense to me.

@@ -372,12 +372,12 @@ def _read_nc_dataset(gpath_file):
# and: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Dataupdatefrequency # pylint: disable=line-too-long
# for further details.

all_dims_except_time = tuple(set(dataarray.dims) - {"time"})
all_dims_except_time = tuple(set(dataarray.dims) - {"time", "expver"})
# Should have only trailing nans.
a = dataarray.sel(expver=1).isnull().any(dim=all_dims_except_time)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Alvaro ([email protected]), the fix should be:

dataarray.sel(expver=1, drop=True)

@@ -372,12 +372,12 @@ def _read_nc_dataset(gpath_file):
# and: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Dataupdatefrequency # pylint: disable=line-too-long
# for further details.

all_dims_except_time = tuple(set(dataarray.dims) - {"time"})
all_dims_except_time = tuple(set(dataarray.dims) - {"time", "expver"})
# Should have only trailing nans.
a = dataarray.sel(expver=1).isnull().any(dim=all_dims_except_time)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the reason why this was not working is that dataarray.sel(expver=1) and dataarray.sel(expver=5),
even though they no longer have the "expver" dim, they both remember which expver coordinate they came from (they keep a scalar expver coordinate), and then they complain when running arithmetics between them because the scalar coordinate don't match. Then by adding "expver" to the dimensions to reduce the scalar coordinate diappears too. However this is unintuitive, because the propose code tries to reduce across a dim, which no longer exists.

I believe the right fix, rather than setting:
all_dims_except_time = tuple(set(dataarray.dims) - {"time", "expver"})

should be to use:

dataarray.sel(expver=1, drop=True)
dataarray.sel(expver=5, drop=True)

If you confirm this alternative fix works, this would be preferable.

I think the right fix for this should be:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking more with Alvaro here: I think these errors only crop up if we download ERA5T, not ERA5. If we simply updated the data, or put limits into how recently we were updating the data, I believe these issues would go away.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've downloaded the data for 2023 on 21-22 September. Will that cause any trouble?
Tried below snipped giving an error.

all_dims_except_time = tuple(set(dataarray.dims) - {"time"})
a = dataarray.sel(expver=1, drop=True).isnull().any(dim=all_dims_except_time)
b = dataarray.sel(expver=5, drop=True).isnull().any(dim=all_dims_except_time)
disjoint_nans = bool(next(iter((a ^ b).all().data_vars.values())))
assert disjoint_nans, "The nans are not disjoint in expver=1 vs 5"
dataarray = dataarray.sel(expver=1).combine_first(dataarray.sel(expver=5))

Error

ValueError: 'expver' not found in array dimensions ('time', 'latitude', 'longitude')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you indicate which line is throwing the error?

Perhaps the problem is that this file has an "expver" coordinate, but not an "expver" dim, which is something we have not found before, but can probably be solved by changing:

if "expver" in dataarray.coords:
   # previous code

to

if "expver" in dataarray.dims:
  # previous code
elif "expver" in dataarray.coords:
  dataarray = dataarray.drop_vars('expver')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've downloaded the data for 2023 on 21-22 September. Will that cause any trouble?

In short, yes I do think this is the root issue. Our goal with keeping data up-to-date is to have a ~1-2 month delay in freshness exactly for this reason. Unless we plan to re-download and update ERA5T data to ERA5 (I don't recommend this), then it would be best to change the interval at which we ingest data to avoid the case that this PR is trying to solve for.

@DarshanSP19
Copy link
Collaborator Author

@alxmrs After verifying the code for April, May and June 2023 data, It seems that the changes are required for data ingestion. As the data for April and May also downloaded on 21st September so it should be fresh enough for the process further.

Copy link
Collaborator

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@alxmrs
Copy link
Collaborator

alxmrs commented Oct 4, 2023

Thanks for the discussion in this PR.

@DarshanSP19 DarshanSP19 merged commit f1327ae into main Oct 5, 2023
3 checks passed
@DarshanSP19 DarshanSP19 deleted the fix-expver-dimension-issue branch October 5, 2023 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants